A study on lattice rescoring with knowledge scores for automatic speech recognition

Authors

  • Sabato Marco Siniscalchi
  • Jinyu Li
  • Chin-Hui Lee
Abstract

We study lattice rescoring with knowledge scores for automatic speech recognition. A frame-based log-likelihood ratio is adopted as a measure of the goodness of fit between a speech segment and the knowledge sources. We evaluate our approach in two applications: phone recognition and connected digit recognition. By incorporating knowledge scores obtained from 15 attribute detectors for place and manner of articulation, we reduced the phone error rate from 40.52% to 35.16% using monophone models. The error rate can be further reduced to 33.42% with triphone models. The same lattice rescoring algorithm is extended to connected digit recognition on the TIDIGITS database without using any digit-specific training data. The digit error rate is reduced from 4.54%, obtained with conventional Viterbi decoding without knowledge scores, to 4.03%.
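As a rough illustration of the idea in the abstract, the sketch below averages a frame-level log-likelihood ratio over a segment and adds it, with a weight, to each arc score before a best-path search over a small lattice. The abstract does not give the exact combination rule, so the linear interpolation, the `weight` parameter, the arc layout, and the function names `frame_llr` and `rescore_lattice` are all assumptions made for this example, not the authors' implementation.

```python
def frame_llr(log_p_target, log_p_anti):
    """Average per-frame log-likelihood ratio for one segment.

    log_p_target / log_p_anti: per-frame log likelihoods under a
    (hypothetical) attribute model and its competing model.
    """
    if not log_p_target or len(log_p_target) != len(log_p_anti):
        raise ValueError("need two non-empty sequences of equal length")
    diffs = [t - a for t, a in zip(log_p_target, log_p_anti)]
    return sum(diffs) / len(diffs)


def rescore_lattice(arcs, knowledge, weight, start, end):
    """Best path through a lattice after adding weighted knowledge scores.

    arcs: list of (src, dst, label, score); node ids are integers and
          every arc satisfies src < dst (assumed topological numbering).
    knowledge: dict mapping arc index -> knowledge score (e.g. an LLR).
    weight: assumed interpolation weight for the knowledge score.
    Returns (total_score, labels) of the best path from start to end.
    """
    best = {start: (0.0, [])}
    # Visiting arcs in order of source node guarantees that the best
    # score of a node is final before any arc leaving it is expanded.
    for idx, (src, dst, label, score) in sorted(
        enumerate(arcs), key=lambda item: item[1][0]
    ):
        if src not in best:
            continue  # node unreachable from the start node
        total = best[src][0] + score + weight * knowledge.get(idx, 0.0)
        if dst not in best or total > best[dst][0]:
            best[dst] = (total, best[src][1] + [label])
    return best[end]


# Toy lattice: two competing phones on the first arc, one on the second.
toy_arcs = [(0, 1, "b", -10.0), (0, 1, "p", -9.5), (1, 2, "a", -5.0)]
toy_knowledge = {0: 1.2, 1: -0.8, 2: 0.3}

baseline = rescore_lattice(toy_arcs, {}, 0.0, start=0, end=2)
rescored = rescore_lattice(toy_arcs, toy_knowledge, 1.0, start=0, end=2)
```

With a weight of zero the search reduces to ordinary best-path decoding over the original arc scores; with a positive weight, the knowledge score can overturn the first-pass ranking, as it does for the first arc of the toy lattice.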


Similar resources

Rescoring-Aware Beam Search for Reduced Search Errors in Contextual Automatic Speech Recognition

Using context in automatic speech recognition allows the recognition system to adapt dynamically to the task and brings gains to a broad variety of use cases. An important mechanism of context inclusion is on-the-fly rescoring of hypotheses with contextual language model content available only in real time. In systems where rescoring occurs on the lattice during its construction as part of beam search d...


On-the-fly lattice rescoring for real-time automatic speech recognition

This paper presents a method for rescoring the speech recognition lattices on-the-fly to increase the word accuracy while preserving low latency of a real-time speech recognition system. In large vocabulary speech recognition systems, pruned and/or lower order n-gram language models are often used in the first-pass of the speech decoder due to the computational complexity. The output word latti...


Keyword Spotting on Word Lattices

In spite of its numerous potential applications, Automatic Speech Recognition (ASR) remains a difficult (and mainly unsolved) problem. In addition to the intrinsic difficulty of the task, users tend to go beyond the pre-defined lexicon words, and the important keywords necessary to understand voice requests are often lost in extra words. In this context, it is often interesting to develop Keywo...


Fuzzy class rescoring: a part-of-speech language model

Current speech recognition systems usually use word-based trigram language models. More elaborate models are applied to word lattices or N-best lists in a rescoring pass following the acoustic decoding process. In this paper we consider techniques for dealing with class-based language models in the lattice rescoring framework of our JANUS large vocabulary speech recognizer. We demonstrate how t...


Knowledge-Based Word Lattice Rescoring in a Dynamic Context

Recent advances in automatic speech recognition (ASR) technology continue to be based heavily on data-driven methods, meaning that the full benefits of such research are often not enjoyed in domains for which there is little training data available. Moreover, tractability is often an issue with these methods when conditioning for long-distance dependencies, entailing that many higher-level know...



Publication date: 2006